ML crash course - Logistic regression

2024-10-22 (modified: 2024-10-27)

Notes on the Logistic regression chapter of the Machine Learning Crash Course.

developers.google.com/machine-learning/crash-course/logistic-regression

Introduction

Learning Objectives:

Identify use cases for performing logistic regression.
Explain how logistic regression models use the sigmoid function to calculate probability.
Compare linear regression and logistic regression.
Explain why logistic regression uses log loss instead of squared loss.
Explain the importance of regularization when training logistic regression models.

Prerequisites:

Calculating a probability with the sigmoid function

This module focuses on using logistic regression model output as-is. In the Classification module, you’ll learn how to convert this output into a binary category.

Sigmoid function

The standard logistic function, also known as the sigmoid function (sigmoid means “s-shaped”), has the formula:

f(x) = \frac{1}{1 + e^{-x}}
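As a minimal sketch of the formula above, the function can be written in plain Python. The name `sigmoid` and the sign-based branching are choices made here for illustration (the branching only avoids overflow in `math.exp` for large negative inputs); they are not part of the course material.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function: 1 / (1 + e^(-x))."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# The output always stays between 0 and 1 and equals 0.5 at x = 0.
for x in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.4f}")
```

Large negative inputs approach 0 and large positive inputs approach 1, which is what lets the output be read as a probability.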

Linear regression from/to logistic regression

You can pass the linear regression prediction into the sigmoid function to obtain the logistic regression prediction.
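A rough sketch of that pipeline: compute the linear output first, then squash it with the sigmoid. The helper names (`linear_output`, `predict_probability`) and the example weights, bias, and features are hypothetical values chosen only for illustration.

```python
import math


def linear_output(weights, bias, features):
    """Linear regression prediction: z = b + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, features))


def predict_probability(weights, bias, features):
    """Logistic regression prediction: sigmoid of the linear output."""
    z = linear_output(weights, bias, features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model parameters and a single example.
weights = [1.0, -2.0]
bias = 0.5
features = [2.0, 1.0]

z = linear_output(weights, bias, features)
p = predict_probability(weights, bias, features)
print(f"linear output z = {z}, probability = {p:.4f}")
```

The only difference between the two models at prediction time is the final sigmoid step.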

The output of linear regression is referred to as the log odds because, if you solve the sigmoid function for $x$, then $x$ is the log of the ratio of the probabilities of the two possible outcomes, $y$ and $1 - y$:

x = \log\left(\frac{y}{1 - y}\right)
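For reference, the intermediate steps can be worked out as follows, writing $y$ for the sigmoid output $f(x)$ and taking $\log$ to be the natural logarithm. This is a standard algebraic derivation, not text from the course.

```latex
\begin{aligned}
y &= \frac{1}{1 + e^{-x}} \\
\frac{1}{y} &= 1 + e^{-x} \\
e^{-x} &= \frac{1}{y} - 1 = \frac{1 - y}{y} \\
-x &= \log\left(\frac{1 - y}{y}\right) \\
x &= \log\left(\frac{y}{1 - y}\right)
\end{aligned}
```

The ratio $\frac{y}{1 - y}$ is the odds of the outcome, so $x$ is its logarithm.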

Key terms

Loss and regularization

Logistic regression models are trained using the same process as linear regression models, with two key distinctions:

Logistic regression models use log loss as the loss function instead of squared loss.
Applying regularization is critical to prevent overfitting.

Log Loss

Squared loss works well for a linear regression where the rate of change of the output values is constant. However, the rate of change of a logistic regression model is not constant.

If you used squared loss to calculate errors for the sigmoid function, as the output got closer and closer to 0 and 1, you would need more memory to preserve the precision needed to track these values.

Instead, the loss function for logistic regression is log loss. The log loss equation returns the logarithm of the magnitude of the change, rather than just the distance from data to prediction.
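A minimal sketch of log loss for binary labels, using the usual per-example form $-y \log(y') - (1 - y) \log(1 - y')$, where $y$ is the label (0 or 1) and $y'$ is the predicted probability. Averaging over examples and clipping predictions with a small `eps` are assumptions made here to keep the example numerically safe; the function name `log_loss` is also just illustrative.

```python
import math


def log_loss(labels, predictions, eps=1e-15):
    """Mean log loss over a dataset of binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, predictions):
        # Clip so that math.log never receives exactly 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# A confident wrong prediction is penalized far more than a hesitant one.
print(log_loss([1], [0.4]))
print(log_loss([1], [0.01]))
```

The penalty grows without bound as a prediction approaches the wrong extreme, unlike squared loss, which is capped at 1 for probabilities.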

Regularization in logistic regression

Regularization, a mechanism for penalizing model complexity during training, is extremely important in logistic regression modeling. Without regularization, the asymptotic nature of logistic regression would keep driving loss towards 0 in cases where the model has a large number of features. Consequently, most logistic regression models use one of the following two strategies to decrease model complexity:

L2 regularization
Early stopping: limiting the number of training steps to halt training while loss is still decreasing.
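To make the "keeps driving loss towards 0" behavior concrete, here is a rough sketch that fits a one-feature model by gradient descent, with and without an L2 penalty on the weight. Everything in it is an assumption for illustration: the toy dataset, the `train` helper, the learning rate, the step count, and the penalty strength `l2`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, l2=0.0, lr=0.5, steps=2000):
    """Gradient descent on mean log loss plus an L2 penalty l2 * w**2."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the linear output
            grad_w += err * x / n
            grad_b += err / n
        grad_w += 2.0 * l2 * w  # gradient of the L2 penalty term
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical, perfectly separable toy data: negatives on the left, positives on the right.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w_plain, _ = train(xs, ys)
w_reg, _ = train(xs, ys, l2=0.1)
print(f"weight without regularization: {w_plain:.3f}")
print(f"weight with L2 penalty:        {w_reg:.3f}")
```

On separable data like this, the unregularized weight keeps growing the longer training runs, since a larger weight always lowers log loss a little more. The penalty term gives the weight a finite optimum, and stopping training earlier also leaves it smaller.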

Key terms

What’s next

ML crash course - Classification